{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hackathon #4\n",
    "\n",
    "Written by Eleanor Quint\n",
    "\n",
    "Topics: \n",
    "- Convolutional layers\n",
    "- Pooling layers\n",
    "\n",
    "This is all setup in a IPython notebook so you can run any code you want to experiment with. Feel free to edit any cell, or add some to run your own code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We'll start with our library imports...\n",
    "from __future__ import print_function\n",
    "\n",
    "import numpy as np                 # to use numpy arrays\n",
    "import tensorflow as tf            # to specify and run computation graphs\n",
    "import tensorflow_datasets as tfds # to load training data\n",
    "import matplotlib.pyplot as plt    # to visualize data and draw plots\n",
    "from tqdm import tqdm              # to track progress of loops"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this hackathon, we'll be using convolutional layers, which are uniquely suited to working with image data. We'll use a dataset which is much more challenging if you're only using dense layers, [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = tfds.load('cifar10', shuffle_files=True) # this loads a dict with the datasets\n",
    "\n",
    "# We can create an iterator from each dataset\n",
    "# This one iterates through the train data, shuffling and minibatching by 32\n",
    "train_ds = ds['train'].shuffle(1024).batch(32)\n",
    "\n",
    "# Looping through the iterator, each batch is a dict\n",
    "for batch in train_ds:\n",
    "    print(\"data shape:\", batch['image'].shape)\n",
    "    print(\"label:\", batch['label'])\n",
    "    break\n",
    "\n",
    "# visualize some of the data, pick randomly every time this cell is run\n",
    "idx = np.random.randint(batch['image'].shape[0])\n",
    "print(\"An image looks like this:\")\n",
    "imgplot = plt.imshow(batch['image'][idx])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Convolutional Layers\n",
    "\n",
    "TensorFlow implements the convolutional layer with [`tf.keras.layers.Conv2D`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D). The function that instantiates the layer has two required arguments: number of filters and filter size. The number of filters will be equal to the number of channels in the output.\n",
    "\n",
    "Important to keep in mind is:\n",
    "\n",
    "1. Input data should be 4-dimensional with shape (batch, height, width, channels), unless the data_format argument is specified.\n",
    "2. Padding is `'valid'` by default, meaning that only filters which lie fully within the input image will be kept. This will make the resulting image slightly smaller than the input. Use `padding='same'` if image size should be preserved.\n",
    "3. Specifying `strides=n` for some n > 1 will result in an output image multiplicatively smaller than the input by a factor of n. Make sure that n isn't greater than the filter size unless you intend to completely ignore part of the input.\n",
    "\n",
    "Let's specify a convolutional classifier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hidden_1 = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same', activation=tf.nn.relu, name='hidden_1')\n",
    "hidden_2 = tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding='same', activation=tf.nn.relu, name='hidden_2')\n",
    "flatten = tf.keras.layers.Flatten()\n",
    "output = tf.keras.layers.Dense(10)\n",
    "conv_classifier = tf.keras.Sequential([hidden_1, hidden_2, flatten, output])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This architecture of a convolutional stack followed by dense layers for classification is pretty typical. It has the major advantages of making spacially local transformations to the image, followed by a global transformation which makes the final classification decision. You might notice as well that the number of filters goes up after each convolution. This is to allow higher layers to learn more specific patterns, so we require more of them than the general patterns of the earlier layers.\n",
    "\n",
    "The number of parameters in each convolution layer can be calculated as `filter_height * filter_width * in_channels * output_channels`, as opposed to dense layers which have `input_size * output_size` parameters. For example, if we're working with CIFAR images, a first layer 3x3 convolution with 32 filters will have `3 * 3 * 3 * 32 = 864` parameters, compared to a dense layer with 32 neurons' `(32 * 32 * 3) * 32 = 98304` parameters. This is a factor of ~114 decrease, significantly smaller!\n",
    "\n",
    "We can use a function of `tf.keras.Sequential` to see more information about the network, but only after the model has been built."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run some data through the network to initialize it\n",
    "for batch in train_ds:\n",
    "    # data is uint8 by default, so we have to cast it\n",
    "    conv_classifier(tf.cast(batch['image'], tf.float32))\n",
    "    break\n",
    "conv_classifier.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the number of channels in each output shape is equal to the number of filters in the channel. Notice also that the Flatten layer changes the shape of the data to only having two dimensions, this allows us to feed it to the Dense layer in the usual way.\n",
    "\n",
    "Let's re-create the network which we used to classify MNIST from Hackathon 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the model\n",
    "dense_classifier = tf.keras.Sequential([tf.keras.layers.Flatten(), tf.keras.layers.Dense(200), tf.keras.layers.Dense(10)])\n",
    "\n",
    "# Initialize the model\n",
    "for batch in train_ds:\n",
    "    # data is uint8 by default, so we have to cast it\n",
    "    dense_classifier(tf.cast(batch['image'], tf.float32))\n",
    "    break\n",
    "\n",
    "dense_classifier.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This densely connected network has fewer parameters than the convolutional network! If we compare them, the one dense layer in the convolutional network has more paramters than the entire dense network. This is because the representation grows as it passes through the conv network, which is undesirable. We can handle this by summarizing the data with pooling layers. We'll add max pool layers which act similarly to conv layers, but rather than computing a function with a filter, it selects the pixel with the highest value in the window and ignores the rest.\n",
    "\n",
    "Let's respecify a larger conv network than the above, but this time with max pooling layers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hidden_1 = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding='same', activation=tf.nn.relu, name='hidden_1')\n",
    "hidden_2 = tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding='same', activation=tf.nn.relu, name='hidden_2')\n",
    "pool_1 = tf.keras.layers.MaxPool2D(padding='same')\n",
    "hidden_3 = tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding='same', activation=tf.nn.relu, name='hidden_3')\n",
    "hidden_4 = tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', activation=tf.nn.relu, name='hidden_4')\n",
    "pool_2 = tf.keras.layers.MaxPool2D(padding='same')\n",
    "flatten = tf.keras.layers.Flatten()\n",
    "output = tf.keras.layers.Dense(10)\n",
    "conv_classifier = tf.keras.Sequential([hidden_1, hidden_2, pool_1, hidden_3, hidden_4, pool_2, flatten, output])\n",
    "\n",
    "# Run some data through the network to initialize it\n",
    "for batch in train_ds:\n",
    "    # data is uint8 by default, so we have to cast it\n",
    "    conv_classifier(tf.cast(batch['image'], tf.float32))\n",
    "    break\n",
    "conv_classifier.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With pooling we've created a much bigger network with fewer parameters. Another way to reduce the size of the data is by setting the `stride` parameter of conv layers to be greater than one (typically two). This will have a similar effect to pooling in reducing the size of the representation by a factor of 4.\n",
    "\n",
    "### Homework\n",
    "\n",
    "Re-write your code from hackathon 2 to use convolutional layers and add code to plot a confusion matrix on the validation data.\n",
    "\n",
    "Specifically, write code to calculate a confusion matrix of the model output on the validation data, and compare to the true labels to calculate a confusion matrix with [tf.math.confusion_matrix](https://www.tensorflow.org/api_docs/python/tf/math/confusion_matrix). (For the inexperienced, [what is a confusion matrix?](https://en.wikipedia.org/wiki/Confusion_matrix)) Use the code example from [scikit-learn](https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html) to help visualise the confusion matrix if you'd like as well.\n",
    "\n",
    "On Canvas, submit your python code in a `.py` and your confusion matrix in a `.png` or `.txt`.\n",
    "\n",
    "I'm expecting this to take about an hour (or less if you're experienced). Feel free to use any code from this or previous hackathons. If you don't understand how to do any part of this or if it's taking you longer than that, please let me know in office hours or by email (both can be found on the syllabus). I'm also happy to discuss if you just want to ask more questions about anything in this notebook!\n",
    "\n",
    "### Coda\n",
    "\n",
    "#### Convolutional Filters from the First Layer of ImageNet\n",
    "\n",
    "Interestingly, automatically learned filters often closely resemble [Gabor Filters](https://en.wikipedia.org/wiki/Gabor_filter). The first layer of the original ImageNet network learned the following filters:\n",
    "\n",
    "![](http://smerity.com/media/images/articles/2016/imagenet_conv_kernels.png \"Convolutional Filters from the First Layer of ImageNet\")\n",
    "\n",
    "#### [Visualizing Convolutional Features](https://distill.pub/2017/feature-visualization/)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dl_notebooks",
   "language": "python",
   "name": "dl_notebooks"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
